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We present a general class of unbiased improved estimators for physical observables in lattice gauge 
theory computations which significantly reduces statistical errors at modest computational cost. The 
error reduction techniques, referred to as covariant approximation averaging, utilize approximations 
which are covariant under lattice symmetry transformations. We observed cost reductions from the 
new method compared to the traditional one, for fixed statistical error, of 16 times for the nucleon 
mass at M n ~ 330 MeV (Domain- Wall quark) and 2.6-20 times for the hadronic vacuum polarization 
at M„ ~ 480 MeV (Asqtad quark). These cost reductions should improve with decreasing quark 
mass and increasing lattice sizes. 
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As non-perturbative computations using lattice gauge 
theory are applied to a wider range of physically interest- 
ing observables, it is increasingly important to find nu- 
merical strategies that provide precise results. In Monte 
Carlo simulations our reach to important physics is still 
often limited by statistical uncertainties. Examples in- 
clude hadronic contributions to the muon's anomalous 
magnetic moment [1] , nucleon form factors and structure 
functions [2], including nucleon electric dipolc moments 
[3-6], hadron matrix elements relevant to flavor physics 
(e.g., K — »■ tt-k amplitudes) [7], and multi-hadron state 
physics [8] , to name only a few. 

As a generalization of low-mode averaging (LMA) [9, 
10] , we present a class of unbiased statistical error reduc- 
tion techniques, utilizing approximations that are covari- 
ant under lattice symmetry transformations. LMA has 
worked well in cases where low eigenmodes of the Dirac 
operator dominate [11]: low energy constants in the e- 
regime [9, 12-15], pseudoscalar meson masses and decay 
constants [16-18], an so on. With a modest increase in 
computational cost, the generalized method can reduce 
statistical errors by an order of magnitude, or more, even 
in cases where LMA fails. 

Unlike LMA, we account for all modes of the Dirac 
operator, averaging over (most of) the lattice volume, 
with modest additional computational cost. The all-to- 
all methods [19, 20] implement this stochastically for the 
higher modes, while treating the low-modes exactly. For 
expectation values invariant under translations, statistics 
effectively increase by averaging over the whole lattice. 
The all-to-all method is advantageous when the stochas- 
tic noise introduced in the target observable is compa- 
rable to, or smaller than, the gauge field fluctuations of 
the ensemble [21], which typically holds only for many 
random source vectors per measurement. The error re- 
duction techniques presented here, which do not rely on 
stochastic noise, are potentially more effective, provided 
an inexpensive approximation can be found for the de- 
sired observable. 



In lattice gauge theory simulations an ensemble of 
gauge field configurations {U\, - ■ ■ , Un coii{ } is generated 
randomly, according to the Boltzmann weight, e~ s ^ u \ 
where S [U] is the lattice-regularized action. The expec- 
tation value of a primary, covariant observable, O, 

1 Ncan! / 1 \ 

<e>> = ^g 0[! /,] + o(^=), ,i, 

is estimated as the ensemble average, over a large number 
of configurations, N con f <~ O(100 — 1000). Here, we pri- 
marily consider observables made of fermion propagators 
Sf[U] computed on the background gauge configuration 
U. 

By exploiting lattice symmetry transformations g £ G, 
that transform U — > U 9 , a general class of variance re- 
duction techniques is introduced. First construct an ap- 
proximation 

£)(appx) t0 q which must fulml the following 

conditions, 

appx-1: (appx) should fluctuate closely with O, 

r = Corr(0,0( a PP x )) = ( AOAO '"""') « 1, 

and ((AO) 2 ) ss ((AO< a PP x ') 2 ) , where AX = X - 
(X). 

appx-2: the cost to compute 0( appx ) is smaller than O's, 
cost(0( ap P x )) < cost(O). 

appx-3: (0( appx ') is covariant under a lattice sym- 
metry transformation, g £ G, (0< appx ) [U 9 ]) = 
(0( app ^' 9 {U}} (in the examples below, a stronger 
condition holds: 0( appx ) is covariant on each con- 
figuration, rather than on average, 0( appx ) [U 9 ] = 
0( appx )' s [£/]). 

Note (appx) and (appx) < 9 refers to the approximations 
before and after applying a symmetry transformation g. 
Using O and 0< appx ) one can define an improved ob- 
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servable 



£)(imp) _ ^j(rost) _|_ ^j(appx) 



(2) 



£)(rest) _ q _ £)(appx) ^j(appx) _ 1 V"^ ^j(appx) 

' G N G ^ 



where an average over Nq symmetry transformations in 
G is taken. 

For appx-1, the statistical error of (CK 1111 ?)) is 



err 



(imp) 



errW2(l - r) + 



N G 



(3) 



TABLE I. LMA and AM A algorithms 



LMA algorithm 



1: Compute low-modes Vi of D H 



AMA algorithm 



1: if A cut # 0, JV oig > 

Compute low-mode Vi of Dh 



Set source b and G— invariant inital guess xq 



Compute exact S and 0[S] precisely (use deflation if Vi exits) 



4: Repeat for Slm in (6) 
and 0<"PP x ) = C>[S LM ] 



5: 0< rc3t ) = Q[S] - Q[5lm] 



4: Repeat for 5am in (8) 

and 0<"PP x ) = 0[Sam] using 
deflated CG (if A cut # 0) 



5: e>( rost > = 0{S\ - 0[S A 



Set shifted source b 9 and G— invariant inital 



Average O 



(appxj,g _ 



over g £ G to get Q^ ppx) 



guess 2?q 



7: Average O 



(appx),g _ 



over jgGto get Q^ ppx > 



which can be made smaller than the original (err) by 
a judicious choice of 0^ appx K The fluctuation from 
0( rcst ), the first term in (3), is suppressed due to r ~ 1, 
while the second term is reduced by 1/Nq without too 
much additional cost as required by appx-2 (correla- 
tions among O, Q^ appx \ and C)( a PP x )>9 have been ignored, 
which is a good approximation for noisy observables or 
large volume). Due to covariance, appx-3, it is easy 
to prove the ensemble averages of (primary observables) 

C ,(appx) jC) (appx), ff; and ^(appx) ^ ^ gQ ^ 

proved estimator (2) is unbiased, (£>( im P)) = (£>). 

The idea of exploiting covariance [9, 10] to improve 
statistical errors has a wider range of applicability than 
LMA, so in general we call it covariant approximation av- 
eraging (CAA). Several comments on CAA follow. From 
Eq. 3 the accuracy of the approximation 0( appx ) rj O 
(appx-1) should be precise enough so that the statis- 
tical error from 0( rest ) is below, say, one-half of the 
desired final precision. Too accurate an approximation 
wastes resources. In 0^ ,mp \ most of the statistical fluctu- 
ation is carried by 0^ appx \ which is reduced by averaging 
over Nc(^> 1) measurements with smaller cost (appx- 
2). Balance between these opposing parts of the method 
allows CAA to reduce statistical errors significantly while 
keeping the computational cost low. 

In the framework of CAA the best choice of approxi- 
mation depends on the target observables and lattice pa- 
rameters such as quark mass and volume. In principle, 
any set of lattice symmetries, G, can be used in CAA. 
We limit ourselves to the case of translation symmetries 
in the following examples. 

The first example is LMA. In LMA eigen-systems of 
the Hermitian Dirac operator are obtained for the part 
of the spectrum closest to zero, 



propagator, 



Slm(x,v) = ^2vi(x)fLM(k)vl(y), 



(6) 



i=0 



/lm(A) = -0(A cut - |A|). (7) 

N tot is the total dimension of the Dirac matrix. The 
recipe for LMA in terms of the CAA master Eq. (2) is 
shown in left column of Table I. Although LMA is par- 
ticularly good for observables dominated by low-modes, 
such as the single pion state for lighter fermion masses, 
LMA does not work so well for heavier hadrons or when 
the quark mass is heavier [16, 18] (see also [22] for depen- 
dence on parity of states and (non-)Hermiticity of Dirac 
operators). This is due to the truncation of the sum in 
(6), i.e., /lm(A) = for |A| > A cut . 

One could improve the above by constructing a poly- 
nomial for 1/A and using it to obtain a better (all-mode) 
approximation of the propagator above A cut : 



i=0 



/am (A) — 



- J A 



|A| < A c 



P„(A) |A| > A cut 



(8) 



(9) 



where P n (A) ss 1/A is a polynomial of degree n, From (8) 
and (9) , one computes the approximate propagator using 
Pu(Dh) in the subspace orthogonal to the eigenvectors 
below Acut, 



Sam = ]T v iY vt + P„(D H )(l - ]T v iV \), 



(10) 



D H v i = X i v i , {i= 1,2,- •• ,AT cig ), (4) 
0<|Ai|<|A 2 |<---<|A JVeig |=A cut , (5) 

which is then used to construct, through spectral de- 
composition, the low-mode approximation of the fermion 



with number of low-modes N e i g . In analogy to LMA, we 
refer to the above as all-mode averaging (AMA) . A recipe 
similar to LMA is shown in the right column of Table I. 

As emphasized in [9] approximate eigenvectors can be 
used in LMA (and AMA) to reduce the cost of this part of 
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the calculation. We have not done that as we find the cost 
of computing them exactly is not too burdensome and is 
partly recouped in the deflation of the Dirac operator. 

Among many different ways [23-25] to obtain P„(A), 
one of the easiest is to use the polynomial implicitly gen- 
erated by an iterative linear solver such as conjugate gra- 
dient (CG). For example (8) can be implemented as a 
CG solution using the low-mode approximation applied 
to the source vector b (the coefficients of P n depend on 
6) as the starting vector, Slm&j which is nothing but a 
deflated CG with iteration number set to the degree of 
the polynomial, n. One can either fix n (number of it- 
erations) or the CG residual vector stopping criterion. 
Either satisfies the covariance condition (appx-3). This 
particular construction of P„(L>#) is called the truncated 
solver method (TSM) [21]. The difference with AMA is 
that TSM is applied in [21] to a random source, and 
the unbiased result is guaranteed by stochasticity while 
AMA relies on covariance, so it does not need the random 
source. 

In [18] low-modes are utilized with Z 3 noise to compute 
many-to-all hadron correlation functions for variance re- 
duction. One may also choose N c { g = 0, A cut — in (9), 
i.e. not to use eigenvectors at all. This may be effective 
for heavier quark masses, but for lighter quarks one needs 
a larger degree polynomial for an accurate approximation 
and N cig > is likely more cost-effective. 

To ensure unbiasness one should check, on a few con- 
figurations, the covariance of the particular implementa- 
tion of the approximation {&ppx) [U g ] = 0^ x) ^[U] by 
computing the approximation explicitly on a transformed 
gauge field to compare with the original gauge field to see 
that they are equivalent to numerical precision. 

To compare the LMA and AMA methods, we use the 
2+1 flavor Domain- Wall fermion (DWF) ensemble gen- 
erated by the RBC/UKQCD collaboration [26] with lat- 
tice size 24 3 x 64, extra dimension size L s = 16, and 
Iwasaki gauge action (/_ — 2.13, or a -1 = 1.73 GeV). The 
low-modes of the Hermitian DWF Dirac operator are ob- 
tained using a 4D-even-odd-preconditioncd, shifted Lanc- 
zos algorithm [11] with accuracy \\(Dh — K)vi\\/\\vi\\ < 
10~ 12 . The eigcn- modes arc used for LMA as in Eqs. (6) 
and (7), to deflate the CG, and to evaluate the low-mode 
parts of both O and 0( a PP x ) , and similarly for AMA as 
in Eqs. (8) and (9). In this paper we compute 180 low- 
modes for light quark mass to — 0.01 and 400 low-modes 
for to = 0.005. 

We adopt translational symmetry on the lattice as G 
and take Nq propagator source locations, starting from 
the origin, separated by 12 lattice units in space and 16 
in time, and the total set of translations numbers Nq = 
2 3 x 4 = 32. For AMA, the stopping condition of the 
"sloppy CG" for our approximation is \\Dhx — < 
3 x 10~ 3 while it is 10~ 8 in [2]. Note that when using 
an even-odd preconditioned Dirac operator, LMA and 
AMA guarantee unbiased estimators for translations by 



TABLE II. Correlation function relative statistical error for 
A?conf = 109 (separated by 40 trajectories) and Nq = 32. 
Nucleon (N), pseudoscalar (PS), and vector (V) channels, 
m = 0.005. Gaussian smeared sink is used for the nucleon, 
others are point sinks. Gaussian smeared source is used for 
all channels. 
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an even number of sites (appx-3). We have explicitly 
checked this in our calculations. 

Table II lists the relative statistical errors for vari- 
ous hadronic two-point correlation functions computed 
using LMA, AMA, and the original CG method, for 
to = 0.005. All were obtained with the same Gaussian 
smeared sources and point (Gaussian) sinks for pseu- 
doscalar and vector (Nucleon) used in [2]. At short 
distance (t = 4), there is no improvement between the 
original and LMA cases, except in the pseudoscalar (PS) 
channel. This is because the contribution of higher modes 
is still important in the short-distance region. Although 
for LMA No could be taken as large as the lattice size 
with modest cost, we set Nq — 32 since larger Nq is 
not effective due to correlations between nearby gauge 
fields in our examples. On the other hand, AMA dra- 
matically reduces the errors (more than 4-6 x) for all 
channels (and different momenta) and for all distances. 
In this example the variance reduction by AMA comes 
almost entirely from the second term in Eq. (3) since 
r= Corr(0,e>( a PP x )) is very close to one (r > 0.9999 for 
to = 0.005), even though the residual stopping criterion 
used for £>( a PP x ) is loose (3 x 10~ 3 ). For LMA at short 
distance r ~ 0.9 so the error from 0( rcst ) is significant. 
We also confirm that for the PS channel both LMA and 
AMA yield improvement, with r > 0.997 even in the 
short distance region, as suggested previously for LMA 
using overlap fermions [17, 18]. For to = 0.01 r is some- 
what smaller (r > 0.99), so the contribution from 0( rest ) 
is more significant. Only 180 low-modes were used for 
to = 0.01. 

Figure 1 shows the nucleon effective mass using LMA 
and AMA for the data in Tab. II, and Tab. Ill compares 
these to an earlier high statistics study of nucleon struc- 
ture functions [2]. The right-most panel in Fig. 1 shows 
significant improvement of the effective mass plateau for 
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FIG. 1. Nucleon effective mass using LMA (middle) and 
AMA (right), m = 0.005. Unimproved calculation (left). 
See Tab. II for parameters. Colored bands denote fit mass 
and range. Gaussian sink. 
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TABLE III. Nucleon masses (GeV) using LMA, AMA and 
from data from a high statistics study [2]. See Table IV for 
costs. Gaussian (gauss) and point (pt) sinks. 





(im P ) ) Ng = 32 
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m 
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fit range 


LMA AMA 


High stat. 
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pt 


8-12 


1.1391(145) 1.1413(61) 


1.1561(104) 


0.005 


gauss 
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1.1305(143) 1.1420(58) 


1.1481(100) 


0.01 


pt 


9-15 


1.2446(164) 1.2363(59) 


1.2101(89) 


0.01 


gauss 


7-15 


1.2240(148) 1.2268(60) 


1.2169(93) 



AMA. Using the same fitting range, the precision of the 
nucleon mass attained with AMA is smaller by more 
than a factor of 1.5 compared to the high statistics study 
[2] where 3728 and 1424 measurements were made for 
rn = 0.005 and 0.01, respectively. The improved statis- 
tics make it easier to choose the fit range based on x 2 > as 
seen in Fig. 1. LMA for nucleon masses was examined in 
[16]. 

Most of the cost of AMA comes from the low-mode 
and sloppy CG parts of the approximation C>( a PP x ) (de- 
flating the Dirac operator significantly reduces the cost 
of computing 0( rest )), and the larger Ng, the lesser the 
relative cost of the former. The various costs for AMA 
in our examples are broken down in Table IV and com- 
pared to the high statistics study [2]. In the example 
using Gaussian sinks, AMA is roughly 16 and 5 times 
less expensive for roughly the same statistical error, for 
rn = 0.005 and 0.01, respectively. LMA is significantly 
less effective, 3.6 and 2.3 times less expensive. As No 
increases, AMA improves statistics with relatively little 
extra cost. For instance, for Ng — 64 AMA costs an 
additional 114, in units of the original propagator. The 
advantage of AMA clearly grows with increasing lattice 
size and decreasing quark mass. The cost of calculating 



FIG. 2. Hadronic vacuum polarization from [1] (squares) and 
using AMA (circles). AMA achieves the same statistical error 
as the original calculation in the range 0-1 GeV 2 for about 
2.6-20 times less computer time. See Tab. IV for details. 



the correlation functions in this example is negligible, but 
this may not be the case for more complicated observ- 
ables. Although disk space and CPU time for eigenvector 
I/O can be non-negligible, we ignore these as the costs 
strongly depend on the implementation details (e.g., we 
could (de)compress eigenvectors) and the features of the 
I/O systems used. 

Another impressive example of AMA is shown in Fig. 2, 
which depicts the hadronic vacuum polarization (HVP) 
from [1] and using AMA for roughly the same amount 
of computational resource (20 configurations, 1400 low- 
modes with accuracy — Aj)uj||/||?;j|| < 10~ 10 , Ng — 
708, and sloppy CG stopping residual criterion 10 -4 com- 
pared to 10~ 8 in [1]). The pion mass is m n = 476 MeV 
and lattice size 48 3 x 144. The HVP contribution to the 
muon's anomalous magnetic moment is sensitive to the 
low Q 2 region [1], so constraining the HVP in this re- 
gion is crucial to precisely extract the anomaly. In this 
test case (which was not optimized) , to achieve the same 
errors on the HVP in the range 0-1 GeV 2 as the origi- 
nal calculation required about 2.6-20 times less computer 
time. Interestingly, LMA actually increases the error in 
this case by about 2 — 3 x because the low-modes do not 
saturate the Ward-Takahashi identity. The stopping cri- 
terion for 0( a PP x ) can not be too low for the same reason, 
though our choice may have been too conservative. The 
costs are summarized in Tab. IV. We note that in this 
case the cost of constructing the low mode part of the 
propagator is roughly equivalent to the sloppy CG cost, 
and that here again the contraction costs are negligible. 

In this letter a new class of unbiased error reduction 
techniques is introduced, using approximations that are 
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TABLE IV. Computational cost. The unit of cost is one quark 
propagator without deflated CG, per configuration. Ng = 32 
for nucleon masses and 708 for HVP. The last column gives 
the cost to achieve the same error for each method, normalized 
to [2] (nucleon mass tun) and [1] (HVP) and scaled by the er- 
rors in Tab. III. HVP scaled costs are maximum and minimum 
in the range Q 2 = - 1 GeV 2 . For m = 0.005, in [2], non- 
relativistic spinors were used which means the scaled costs in 
this case were increased by two. The cost of £)^ ppx ' for AM A 
is split to show the sloppy CG and low-mode costs separately. 
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In [2] a doubled source was used to reduce this cost by two. 



covariant under lattice symmetries. This is a general- 
ization of low-mode averaging which reduces the statisti- 
cal error for observables that are not dominated by low- 
modes. We have shown through several numerical ex- 
amples that all-mode averaging is a powerful example 
of CAA, performing better than LMA and works well 
even in cases where LMA fails. In the examples given 
here, AMA reduced the cost by factors up to <~ 20, over 
conventional computations, and these factors will only 
increase for larger lattice sizes and smaller quark masses. 
The method has great potential for investigations of dif- 
ficult but important physics problems where statistical 
fluctuations still dominate the total uncertainty, like the 
nucleon electric dipole moment or hadronic contributions 
to the muon anomalous magnetic moment. Since CAA 
works without introducing any statistical bias (so long 
as condition appx-3 holds), there are many possibilities 
that also satisfy appx-1 and appx-2: One can construct 
£)(appx) using different lattice fermions and parameters 
(mass, L s (for DWF), boundary conditions and so on). 
(Oq PPX ^) can be measured on a larger number of gauge 
configurations, which is potentially advantageous for ob- 
servables dominated by gauge noise such as disconnected 
diagrams. One may also consider other types of approx- 
imations such as the hopping parameter expansion used 
in [21], or approximations at the level of hadronic Green's 
functions. 

Numerical calculations were performed using the RICC 
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